⚡ Model Efficiency - jimman · Scour

Determining Energy Efficiency Sweet Spots in Production LLM Inference

arxiv.org·2d

⚡LLM Optimization

Show HN: Model Training Memory Simulator

czheo.github.io·6h·

Discuss: Hacker News

✍️Prompt Engineering

Is Your Machine Learning Pipeline as Efficient as it Could Be?

kdnuggets.com·2d

⚡LLM Optimization

Zero-Latency Local AI: Tuning Your Linux Kernel for LLM Inference 🐧🧠

dev.to·1d·

Discuss: DEV

⚡LLM Optimization

Heterogeneous Processing: A Strategy for Augmenting Moore's Law (2006)

linuxjournal.com·2h·

Discuss: Hacker News

⚡LLM Optimization

How I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS

mohammedeabdelaziz.github.io·1d·

Discuss: Hacker News

⚡LLM Optimization

Understanding LLM Inference Engines: Inside Nano-vLLM (Part 2)

neutree.ai·2d·

Discuss: Hacker News

⚡LLM Optimization

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

github.com·1d·

Discuss: Hacker News, r/LocalLLaMA

⚡LLM Optimization

From Prediction to Compilation: A Manifesto for Intrinsically Reliable AI

news.ycombinator.com·4h·

Discuss: Hacker News

✍️Prompt Engineering

Optimized LLM Inference Engines

rishirajacharya.com·4d

⚡LLM Optimization

KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs

arxiv.org·2d

⚡LLM Optimization

Fast Autoscheduling for Sparse ML Frameworks

ajroot.pl·3d·

Discuss: Hacker News, r/Compilers

⚡LLM Optimization

GPU for AI Training: Powering the Next Generation of Intelligent Systems

dev.to·1d·

Discuss: DEV

🔍AI Interpretability

Creeping memory allocation

community.folivora.ai·8h

Quantization-Aware Distillation

ternarysearch.blogspot.com·13h·

Discuss: Hacker News

⚡LLM Optimization

Building Highly Efficient Inference System for Recommenders Using PyTorch

pytorch.org·2d·

Discuss: Hacker News

⚡LLM Optimization

Deep dive into Hierarchical Navigable Small Worlds

amandeepsp.github.io·16h·

Discuss: Hacker News, r/Zig, r/programming

⚡LLM Optimization

Examining Turbopuffer ANN v3

terencezl.github.io·3d·

Discuss: Hacker News

⚡LLM Optimization

Why Files Are Not Enough as Memory for AI Agents

medium.com·5h·

Discuss: Hacker News

⚡LLM Optimization

Human-like Search for Modern Applications

anvitra.ai·13h·

Discuss: Hacker News

⚡LLM Optimization
